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We compute spectra of sample auto-covariance matrices of second order stationary stochastic 
processes. We look at a limit in which both the matrix dimension TV and the sample size M used 
to define empirical averages diverge, with their ratio a = N/M kept fixed. We find a remarkable 
scaling relation which expresses the spectral density p(A) of sample auto-covariance matrices for 
processes with dynamical correlations as a continuous superposition of appropriately rescaled copies 
of the spectral density p& (A) for a sequence of uncorrelated random variables. The rescaling factors 
are given by the Fourier transform C(q) of the auto-covariance function of the stochastic process. 
We also obtain a closed-form approximation for the scaling function p„ : (A). This depends on the 
shape parameter a, but is otherwise universal: it is independent of the details of the underlying 
random variables, provided only they have finite variance. Our results are corroborated by numerical 
simulations using auto-regressive processes. 
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The present investigation concerns spectral properties 
of sample auto-covariance matrices derived from time 
series, and in particular the way they are affected by 
finite sample fluctuations. Having a theory that would 
quantify such effects analytically would clearly be use- 
ful for e.g. the empirical analysis of stochastic pro- 
cesses. However, such theoretical understanding is at 
present almost entirely lacking - in marked contrast to 
the situation for the closely related problem of sample 
covariance matrices of a multi-dimensional data set 
estimated from finitely many independent measure- 
ments. 

From an abstract point of view, both problem classes 
belong to random matrix theory [HQ- In the case 
of sample covariance matrices, the random matrix en- 
semble in question is the well known Wishart-Laguerre 
ensemble [3] , which has been widely studied for several 
decades, and for which numerous results are available. 
The spectral problem for example was solved in the 
60s by Marcenko and Pastur 4] ; typical fluctuations of 
the largest eigenvalue of Wishart matrices were shown 
[f| to follow a Tracy- Widom distribution , and large 
deviation properties of both the largest [7[ and small- 
est Q eigenvalue have recently been characterized. 
Numerous variants of the original Wishart-Laguerre 
ensemble have been studied in the literature over the 
years (e.g. [§-[l2j]), and applications formulated in a 
variety of fields, including multivariate statistics (l 31 ] . 
wireless communication 14J and the analysis of cross- 
correlations in financial data [i^flli. For a more com- 
plete recent overview, we refer to |2|. 

Due to the temporal structure of the underlying sig- 
nals in the problem of sample auto-covariance matri- 
ces of time series, the ensemble of random matrices 
describing this problem is radically different from the 
Wishart-Laguerre ensemble, and much less is known 



about their spectral properties. The existence of the 
limiting spectral density of sample auto-covariance 
matrices of moving-average processes [13] with i.i.d. 
driving (of both finite and infinite order) has in fact 
been established only very recently [l8j]; the corre- 
sponding existence proof for the closely related prob- 
lem of random Toeplitz matrices with i.i.d. entries is 
also only a few years old |19j . We are not aware of 
closed form expressions for limiting spectral densities 
for these cases - whether exact, or approximate but 
of a quality that would allow meaningful use for e.g. 
time series analysis. The purpose of the present letter 
is to report recent progress that fills this gap. 

We consider stationary zero-mean processes (x n ) n &- 
These could be discrete-time processes to begin with, 
or sampled from continuous-time processes at dis- 
crete equidistant time steps At, in which 
x(nAr). We are interested in the spectrum of N x N 
empirical auto-covariance matrices C, which are esti- 
mated by measurements on sequences of M samples. 
There are several (non-equivalent) ways to define the 
elements of C. Our choice is 



M-l 



-jTj ^2 x m+k x m+ e , l< k,£< N 



(1) 



Sample auto-covariance matrices C of this form con- 
stitute randomly perturbed Toeplitz matrices |20j . 
Note that our choice differs from the ones in |18J, 
which are simpler for being constructed as random 
Toeplitz matrices from the start. 

Our main results are the following. We find a re- 
markable scaling relation which expresses the spectral 
density p(X) of sample auto-covariance matrices for 
processes with dynamical correlations as a continuous 
superposition of rescaled copies of the spectral den- 



2 



sity pa (A) for a sequence of uncorrelated, i.i.d. ran- 
dom variables. We also obtain a simple closed form 
expression for that provides an excellent approx- 
imation to numerically simulated spectra. 

The spectral density of a matrix C is evaluated in 
terms of its resolvent as 

pjv(A;C) = ^ImTr [A e l-C] _1 . (2) 

Here 1 is the N x N unit matrix and A e = A — ie, the 
limit e — > + being understood. We follow Edwards 
and Jones [2l[ and express the trace of the resolvent 
in terms of a Gaussian integral as 

2 9 1 

Piv(A) = - - limlm — — (lnZ N ) , (3) 



with 



N , . 

Y[ r^— exp | - u k (X £ 5ki - Cki)ue^ 

(4) 

and angled brackets indicating an ensemble average. 
This average can be evaluated using replicas. Anal- 
ogous calculations in random matrix theory [2 J| sug- 
gest that the final results will exhibit the structure 
of a replica-symmetric high-temperature solution, and 
hence that an annealed calculation (which replaces 
(IyiZn) by ln(Zjv) in (|3])) will provide exact results. 
This is the approach we adopt here. 

Inserting the definition Eq. (JTJ) into Eq. (g]), one notes 
that Zn depends on the disorder, i.e. on the {x n }, 
only through the M variables 



1 N 

j= x. l+k u k , < i < M . 



(5) 



fc=i 



Assuming that the true auto-covariance C(k) = 
{x n x n +k) is absolutely summable, we can argue from 
the central limit theorem (CLT) for weakly dependent 
random variables that the z, will be correlated Gaus- 
sian variables with (z^ = 0, and covariance matrix Q 
whose elements are given in terms of C as 



k( 



The disorder average (. . .) is thus a Gaussian integral, 
which can be performed to give 



{-5*-E<4 

k 

-lndet(l-iaQ)} . (7) 



The matrix Q being Toeplitz, we will use Szego's theo- 
rem 20] to evaluate the 'spectral sum' lndet(l— iaQ) 



in Eq. ([7]). Given that our sequence of Q-matrices 
doesn't fully fit the assumptions of the standard the- 
ory in that matrix elements are themselves dependent 
on the dimension M, we expect this to be only an ap- 
proximation; it should, however, become exact in the 
limit a — > 0. 

To proceed, we need Fourier representations of Q, and 
we will have to keep keep track of finite-M, finite- TV 
expressions in what follows. Assuming M to be odd, 
we have 



(M-X)/2 

l - M £ ' 

|U=-(iW-l)/2 



(8) 



for the ({uk} dependent) elements of Q, with 



U k U( 



kf 



C(q^\u(q^\ 2 = Q(q^ 



(9) 



where = |f p and u(q^) = J2k=i e'^Ufc- Here 



(M-l)/2 

C(q,) = Yl C(n)^ 
n=-(j\/-l)/2 



(10) 



is the Fourier transform of the true auto-covariance 
function of the underlying process. Truncating the 
sum in Eq. (fTOj) at \n\ < (M — l)/2 will create neg- 
ligible errors in the large M limit if J2^=-ao \C(n)\ 
exists, as already required when appealing to the CLT 
for the Zi statistics above. Restricting the g M values 
to the discrete grid with spacing 2tt/M approximates 
Q by its cyclified version. In Szego's terminology, the 
matrix Q has Q M = Q(q^) as its (M-grid) symbol, and 
Szego's approximation for the spectral sum reads 

(M-l)/2 

lndet(l-iaQ) ~ ^ In (l - iaQ M ) . (11) 

M=-(M-l)/2 

The symmetry C(n) — C(—n) entails C(q^) = 
C{— q^), thus Q{q^) = Q{~q^)- Next, one extracts 
the {u k } dependence (via the {Q^}) from the evalua- 
tion of (|11[) . using ^-functions and their Fourier repre- 
sentations. The {u k } integrals then become Gaussian, 
and we can express (Zn) as 



Z N ) = 



(M-l)/2 - (M-l)/2 

ndQf+dQ^ ( ^ 
^- — - exp 



2tt 



iQ f_iQ fi 



(M-l)/2 



ln(l - iaQp) - \ lndet(A e l - (12) 



The elements of the N x N matrix R in (fT2"]) are given 
by Rm = wEiV 1)/2 Q^(^)cos(^(fc-£)), with 
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with 



FIG. 1: (Colour online) Spectral density for sample auto 
covariance matrices of i.i.d. signals x n ~ A/"(0, 1) at a = 0.1 
(green); analytic approximation Eq. ()24|) for p„ (A) (red). 
The Marcenko-Pastur law (blue-dashed) for the same a is 
also shown for comparison. 



dQ M e-^O,. _ J a -i e -<3 M /a . Q p > o , 



2tt 1 — iaQp 







else 



After rescaling Q M /a — >• Q M this yields 

(£j\r) = (exp{ - ilndct(A £ I-i?)}^ } (13) 



with now 



(Af-l)/2 



^ = m 51 cos(g p (fc - £)) . (14) 

In Eq. (Tilfl) we have introduced the short-hand 

roo (M-l)/2 



}= f II {d^e" 4 " }(•••) (15) 

J ° ^=0 



for the Q^-integrals. As the notation indicates, these 
amount to averages over exponentially distributed 
random variables of unit mean. Hence within our 
Szego-approximation, the original spectral problem 
for sample auto-covariance matrices C is equivalent 
to that for random Toeplitz matrices R given by ([T4")) . 

To make progress we use the fact that the matrices R, 
too, are Toeplitz, and approximate the spectral sum 
lndet(A s l — R) appearing in (fT3"|) in terms of Szego's 
theorem, 

(N-l)/2 

lndet(A £ i-i?) ~ ^ In (A £ - R v ) , (16) 

i/=-(JV-l)/2 



(M-l)/2 



(N-l)/2 

E 

n=-(JV-l)/2;<r=±l 



= E E 

ju=0 
(M-l)/2 



e i(p„+a-g (i )n 



(17) 



for p v = i/ defined on a grid of spacing 2ir/N, and 
the S-kernel given by 



5 ^ - ^ E 



sin(iV(p„ + gg M )/2) 
T =±i sin ((^ + ct «m)/ 2 ) 



(18) 



Next one extracts the Q/x dependence from the spec- 
tral sum (|16p by enforcing the ^-definitions using 
(5-functions and their Fourier representations. This 
enables one to perform the R v integrals using residues 
much as in the case of the integrals above, giving 



1 < k, £ < N. We have combined modes with /i and 
— // and neglected as subleading the fact that the re- 
sulting prefactors differ for the fj. = mode. 

We use residues to evaluate the integrals in (fT2|) : with 



/(JV-l)/2 

(zn)=( n f v 

\ v=0 



(19) 



F v =i / die- a l At ^M l, "^%)M . (20) 
Jo 

The coupling via the S'-kernel entails that the F v for 
different v are correlated. To proceed, we exploit the 
property that the S'-kernel is rapidly oscillating, and 
sharply peaked at \p v ± q^\ ~ 0(l/N). The domi- 
nant contributions to the exponential in (|20[) at fixed 
v therefore lie in the interval I v — {fi : \v ± a/J,\ < 1}. 
Approximating the S-kernel by a rectangular window 
(of height a/2) on /„ and using smoothness of C(q^) 
on the g^-scale, we set 

(M-l)/2 



li=0 



As the I v are overlapping, the F v in (|2T))) remain cor- 
related. As a last step we ignore these residual corre- 
lations and substitute y — aR u C{q l _ L ) /2 in Eq. (|20|) to 
arrive at a closed form approximation for (Zn): 

t=o \<xC(p v )Ja (l-iy) 2/ J 

For the spectral density ([3]) in the thermodynamic 
limit N — > oo we then get 

2 9 1 

^ (A) = -nl^o Im dX^LN ln{ZN) 

*-7^p2°(aA«)) (23) 

7T C(q) 
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FIG. 2: (Colour online) Logarithmic spectral densities of 
auto-covariance matrices for an AR2 process with ai = 1/2 
and ci2 = 5/16, comparing the scaling prediction Eq. (1 231) 
using the empirical scaling function (see text, black full 
curve) with that based on the analytic approximation (T22| 
for the scaling function (red full curve), and simulations 
(green dashed curve) Double-peaked set of curves: a — 
0.1, single-peaked set of curves: a = 0.8. 



in which 



5 W(A) = -lim il m jLlnlJl\ £ 

" " CM \ <> 



i-0 7T 



(24) 



with I a obtained from 
T-function: for Imx < 



in terms of an incomplete 



I a (x) 



dye- ivx (l-iy) 



-2/o 



= i (- X y 1+2/a e~ x T(l - 2/a, -x) . (25) 



Note that the scaling function pi ' (A) has an inde- 
pendent meaning as the spectral density of the em- 
pirical auto-covariance matrix (at the same value of 
the shape parameter a) for a sequence of uncorrelated 
data, for which C(q) = 1. Eq. (|23|) thus constitutes a 
remarkable scaling relation relating the spectral den- 
sity of sample auto-covariance matrices for processes 
with dynamical correlation to the spectral density of 
sample auto-covariance matrices for i.i.d. sequences of 
random data. 

We have checked our results using simulations of 
sets of 5000 N x N auto-covariance matrices with 
TV = 800. Fig. 1 compares simulations for a se- 
quence of i.i.d. variables with our prediction (|24[) for 
Pa \ and the Marcenko-Pastur law at a = 0.1. Fig. 
2 looks at auto-regressive AR2 processes of the form 
x n + aiXn-i + a2X n -2 = cr£„,_with i.i.d. £„ ~ A/(0, 1), 
and a chosen to ensure that C(0) = (x%) = 1. It com- 
pares simulations with scaling based on our analytic 
approximation (|24|) for p& , and with scaling using an 
empirical scaling function determined via simulation, 
with an a = 0.1-example shown in Fig. 1. 



Whereas scaling appears to be exact using the em- 
pirical scaling function, our analytic result is not. It 
nevertheless produces quantitatively accurate results, 
even for a as large as 0.8. We have elements of an 
independent proof of scaling which we intend to pub- 
lish in a forthcoming paper. Judging from the impact 
that analogous results for (Wishart-Laguerre) sample 
covariance matrices have had, we believe our results to 
hold significant potential for applications in a variety 
of fields, including time-series analysis, information 
theory, or signal processing. 
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